Streaming Graph Challenge: Stochastic Block Partition
An important objective for analyzing real-world graphs is to achieve scalable
performance on large, streaming graphs. A challenging and relevant example is
the graph partition problem. As a combinatorial problem, graph partition is
NP-hard, but existing relaxation methods provide reasonable approximate
solutions that can be scaled for large graphs. Competitive benchmarks and
challenges have proven to be an effective means to advance state-of-the-art
performance and foster community collaboration. This paper describes a graph
partition challenge with a baseline partition algorithm of sub-quadratic
complexity. The algorithm employs rigorous Bayesian inferential methods based
on a statistical model that captures characteristics of real-world graphs.
This strong foundation enables the algorithm to address limitations of
well-known graph partition approaches such as modularity maximization. This
paper describes various aspects of the challenge including: (1) the data sets
and streaming graph generator, (2) the baseline partition algorithm with
pseudocode, (3) an argument for the correctness of parallelizing the Bayesian
inference, (4) different parallel computation strategies such as node-based
parallelism and matrix-based parallelism, (5) evaluation metrics for partition
correctness and computational requirements, (6) preliminary timing of a
Python-based demonstration code and the open source C++ code, and (7)
considerations for partitioning the graph in streaming fashion. Data sets, source code for the algorithm and metrics, and detailed documentation are available at GraphChallenge.org.
Comment: To be published in the 2017 IEEE High Performance Extreme Computing Conference (HPEC).
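As a rough illustration of the block-partition idea behind the baseline, a node can be reassigned to the block holding most of its neighbors. This is a hypothetical simplification, not the challenge's Bayesian inference: the greedy neighbor-majority update, the function name, and the toy graph are all illustrative only.

```python
import random
from collections import defaultdict

def partition_graph(edges, num_blocks, num_sweeps=10, seed=0):
    """Greedy block assignment: repeatedly move each node to the block
    that holds the most of its neighbors. A crude stand-in for the
    Bayesian block-merge and nodal updates of the challenge baseline."""
    rng = random.Random(seed)
    nodes = sorted({u for e in edges for u in e})
    # start from a random assignment of nodes to blocks
    assign = {v: rng.randrange(num_blocks) for v in nodes}
    nbrs = defaultdict(list)
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    for _ in range(num_sweeps):
        for v in nodes:
            counts = defaultdict(int)
            for w in nbrs[v]:
                counts[assign[w]] += 1
            if counts:
                # move v to the block most of its neighbors occupy
                assign[v] = max(counts, key=counts.get)
    return assign
```

On a graph made of two densely connected groups joined by a single edge, this sweep quickly makes each group internally uniform, which is the qualitative behavior a block-partition algorithm should exhibit.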
VAST Challenge 2015: Mayhem at Dinofun World
A fictitious amusement park and a larger-than-life hometown football hero provided participants in the VAST Challenge 2015 with an engaging yet complex storyline and setting in which to analyze movement and communication patterns. The datasets for the 2015 challenge were large, averaging nearly 10 million records per day over a three-day period, with a simple, straightforward structured format. The simplicity of the format belied a complex wealth of features contained in the data that needed to be discovered and understood to solve the tasks and questions that were posed. Two Mini-Challenges and a Grand Challenge composed the 2015 competition. Mini-Challenge 1 contained structured location and date-time data for park visitors, against which participants were to discern groups and their activities. Mini-Challenge 2 contained structured communication data consisting of metadata about time-stamped text messages sent between park visitors. The Grand Challenge required participants to use both movement and communication data to hypothesize when a crime was committed and identify the most likely suspects from all the park visitors. The VAST Challenge 2015 received 74 submissions, and the datasets were downloaded, at least partially, from 26 countries.
Visualization Evaluation for Cyber Security: Trends and Future Directions
The Visualization for Cyber Security research community (VizSec) addresses longstanding challenges in cyber security by adapting and evaluating information visualization techniques with application to the cyber security domain. This research effort has created many tools and techniques that could be applied to improve cyber security, but the community has not yet established unified standards for evaluating these approaches to predict their operational validity. In this paper, we survey and categorize the evaluation metrics, components and techniques that have been utilized in the past decade of VizSec research literature. We also discuss existing methodological gaps in evaluating visualization in cyber security, and suggest potential avenues for future research in order to help establish an agenda for advancing the state of the art in evaluating cyber security visualization.
GraphChallenge.org: Raising the Bar on Graph Analytic Performance
The rise of graph analytic systems has created a need for new ways to measure
and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE
Graph Challenge has been developed to provide a well-defined community venue
for stimulating research and highlighting innovations in graph analysis
software, hardware, algorithms, and systems. GraphChallenge.org provides a wide
range of pre-parsed graph data sets, graph generators, mathematically defined
graph algorithms, example serial implementations in a variety of languages, and
specific metrics for measuring performance. Graph Challenge 2017 received 22
submissions by 111 authors from 36 organizations. The submissions highlighted
graph analytic innovations in hardware, software, algorithms, systems, and
visualization. These submissions produced many comparable performance
measurements that can be used for assessing the current state of the art of the
field. There were numerous submissions that implemented the triangle counting
challenge and resulted in over 350 distinct measurements. Analysis of these
submissions shows that their execution time is a strong function of the number
of edges in the graph, $N_e$, and is typically proportional to $N_e^{4/3}$ for
large values of $N_e$. Combining the model fits of the submissions presents a
picture of the current state of the art of graph analysis, which is typically
$10^8$ edges processed per second for graphs with $10^8$ edges. These results
are substantially faster than serial implementations commonly used by many graph
analysts and underscore the importance of making these performance benefits
available to the broader community. Graph Challenge provides a clear picture of
current graph analysis systems and underscores the need for new innovations to
achieve high performance on very large graphs.
Comment: 7 pages, 6 figures; submitted to IEEE HPEC Graph Challenge. arXiv admin note: text overlap with arXiv:1708.0686
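The triangle-counting kernel that many submissions implemented can be sketched with a set-intersection approach. This is a minimal illustrative version, not any particular submission's code; the function name is hypothetical.

```python
def count_triangles(edges):
    """Count triangles by intersecting the neighbor sets of each edge's
    endpoints -- the common set-intersection kernel for triangle counting.
    `edges` is an iterable of unordered (u, v) pairs with u != v."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    # every triangle contains three edges, so it is counted three times
    total = sum(len(nbrs[u] & nbrs[v]) for u, v in edges)
    return total // 3
```

Because the per-edge work depends on neighbor-set sizes, the total work grows faster than linearly in the edge count on realistic graphs, consistent with the super-linear execution-time scaling reported above.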
Collaborative Data Analysis and Discovery for Cyber Security
In this paper, we present the Cyber Analyst Real-Time Integrated Notebook Application (CARINA). CARINA is a collaborative investigation system that aids in decision making by co-locating the analysis environment with centralized cyber data sources, and providing next-generation analysts with increased visibility into the work of others. In current-generation cyber work, tools limit analysts' ability to collaborate, often relying on individual record keeping, which hinders their ability to reflect on their own work and transition analytic insights to others. While online collaboration technologies have been shown to encourage and facilitate information sharing and group decision making in multiple contexts, no such technology exists today in cyber. Using visualization and annotation, CARINA leverages conversation and ad hoc thought to coordinate decisions across an organization. CARINA incorporates features designed to incentivize positive information-sharing behaviors, and provides a framework for incorporating recommendation engines and other analytics to guide analysts in the discovery of related data or analyses. In this paper, we present the user research that informed the development of CARINA, discuss the functionality of the system, and outline potential use cases. We also discuss future research trajectories and implications for cyber researchers and practitioners.
Eventpad: Rapid malware analysis and reverse engineering using visual analytics
Forensic analysis of malware activity in network environments is a necessary yet very costly and time consuming part of incident response. Vast amounts of data need to be screened, in a very labor-intensive process, looking for signs indicating how the malware at hand behaves inside, e.g., a corporate network. We believe that data reduction and visualization techniques can assist security analysts in studying behavioral patterns in network traffic samples (e.g., PCAP). We argue that the discovery of patterns in this traffic can help us to quickly understand how intrusive behavior such as malware activity unfolds and distinguishes itself from the rest of the traffic. In this paper we present a case study of the visual analytics tool EventPad and illustrate how it is used to gain quick insights into the analysis of PCAP traffic using rules, aggregations, and selections. We show the effectiveness of the tool on real-world data sets involving office traffic and ransomware activity.
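The rule-and-aggregation workflow described above can be mimicked in a few lines. This is a toy sketch only: the record fields, the function name, and the predicate are hypothetical, and real PCAP parsing is omitted entirely.

```python
from collections import Counter

def aggregate_flows(packets, rule):
    """Group packet records matching a predicate rule into per-flow counts,
    a toy version of rule-based aggregation over PCAP-like data.
    Each packet is a dict with illustrative 'src', 'dst', 'port' fields."""
    flows = Counter()
    for pkt in packets:
        if rule(pkt):
            flows[(pkt["src"], pkt["dst"])] += 1
    return flows
```

Applying a rule such as "destination port 445" collapses thousands of raw records into a handful of flow-level counts, which is the kind of data reduction that makes visual inspection of suspicious traffic tractable.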